If caffeine is one of the most popular drugs, then coffee is likely one of the most popular delivery systems for it. Beyond the caffeine itself, people enjoy the wide variety of coffee-based drinks. Let’s do a rough investigation of the “market share” of two of the top coffee chains in the United States!
World Population Review provides some great data on store locations and chain prevalence. Check out this page for the Starbucks Coffee locations in the United States. Notice that this page only really gives the name of the state and the number of locations in that state.
Scrape the Location Counts
Use the beautifulsoup library to scrape the data (from the link above) on state names and corresponding number of store locations, for the following chains:
Starbucks
Dunkin’ Donuts
Starbucks
!pip install requests beautifulsoup4 pandas
Requirement already satisfied: requests in /opt/anaconda3/lib/python3.12/site-packages (2.32.2)
Requirement already satisfied: beautifulsoup4 in /opt/anaconda3/lib/python3.12/site-packages (4.12.3)
Requirement already satisfied: pandas in /opt/anaconda3/lib/python3.12/site-packages (2.2.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/anaconda3/lib/python3.12/site-packages (from requests) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in /opt/anaconda3/lib/python3.12/site-packages (from requests) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/anaconda3/lib/python3.12/site-packages (from requests) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in /opt/anaconda3/lib/python3.12/site-packages (from requests) (2024.8.30)
Requirement already satisfied: soupsieve>1.2 in /opt/anaconda3/lib/python3.12/site-packages (from beautifulsoup4) (2.5)
Requirement already satisfied: numpy>=1.26.0 in /opt/anaconda3/lib/python3.12/site-packages (from pandas) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/anaconda3/lib/python3.12/site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/anaconda3/lib/python3.12/site-packages (from pandas) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /opt/anaconda3/lib/python3.12/site-packages (from pandas) (2023.3)
Requirement already satisfied: six>=1.5 in /opt/anaconda3/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
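As a rough sketch of the parsing step, the function below pulls (state, count) pairs out of an HTML string. It assumes, like the `scrape_store_data` function in the Automate section, that the counts sit in a `<table class="wpr-table">` with each state name in the row’s `<th>`; it also assumes the first `<td>` holds the count of interest. `parse_state_counts` and the sample HTML are hypothetical, and working from a string keeps the example offline.

```python
from bs4 import BeautifulSoup


def parse_state_counts(html):
    """Return (state, count) pairs from the table with class "wpr-table"."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "wpr-table"})
    if table is None:
        return []
    pairs = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        header = row.find("th")
        cells = row.find_all("td")
        if header is None or not cells:
            continue
        raw = cells[0].get_text(strip=True).replace(",", "")
        # A blank or non-numeric cell is recorded as missing, not as zero stores
        count = int(raw) if raw.isdigit() else None
        pairs.append((header.get_text(strip=True), count))
    return pairs


# Hypothetical HTML in the assumed layout, using 2023 Starbucks counts from this notebook
sample_html = """
<table class="wpr-table">
  <tr><th>State</th><th>Locations</th></tr>
  <tr><th>Alabama</th><td>85</td></tr>
  <tr><th>Alaska</th><td>49</td></tr>
  <tr><th>Wyoming</th><td>23</td></tr>
</table>
"""
print(parse_state_counts(sample_html))
```

In the real notebook the HTML would come from `requests.get(url).content` instead of a literal string.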
Find the revenue, stock price, or your financial metric of choice for each of the companies listed above (if you can find a website to scrape these from that’s great!…but it’s okay if you manually enter these). Merge these values into your big dataset. Note: these values may be repeated for each state.
import pandas as pd

# Starbucks Revenue from 2021 and 2023
# https://stockanalysis.com/stocks/sbux/revenue/
# Starbucks Revenue from 2024
# https://stories.starbucks.com/press/2024/starbucks-reports-preliminary-q4-and-full-fiscal-year-2024-results/
# Starbucks Revenue Data (Manually entered, in billions of USD)
starbucks_revenue = {
    '2021': 29.06,
    '2023': 35.98,
    '2024': 36.2,
}

# Dunkin Donuts Revenue from 2023
# https://www.zippia.com/dunkin-donuts-careers-554008/revenue/
# Dunkin Donuts Estimated Revenue from 2023
# https://companiesmarketcap.com/dunkin-brands/revenue/
# Dunkin' Revenue Data (Manually entered, in billions of USD)
dunkin_revenue = {
    '2023': 1.4,
    '2024': 1.25,
}

# One row per year so the figures can be joined onto every state-year row
starbucks_revenue_df = pd.DataFrame(list(starbucks_revenue.items()), columns=['Year', 'Yearly Starbucks Revenue'])
dunkin_revenue_df = pd.DataFrame(list(dunkin_revenue.items()), columns=['Year', 'Yearly Dunkin Revenue'])
starbucks_revenue_df['Year'] = starbucks_revenue_df['Year'].astype(str)
dunkin_revenue_df['Year'] = dunkin_revenue_df['Year'].astype(str)

# Match the key dtype before merging (pandas refuses to merge str keys with int keys)
coffee_population_df['Year'] = coffee_population_df['Year'].astype(str)

# Left merges keep every state-year row; years without a revenue figure get NaN
coffee_population_df = pd.merge(coffee_population_df, starbucks_revenue_df, on='Year', how='left')
coffee_population_df = pd.merge(coffee_population_df, dunkin_revenue_df, on='Year', how='left')
coffee_population_df
     State      Year  Starbucks Stores  Dunkin Donuts Stores  Population 2020  Yearly Starbucks Revenue  Yearly Dunkin Revenue
0    Alabama    2021  99                NaN                   5024279          29.06                     NaN
1    Alabama    2023  85                59.0                  5024279          35.98                     1.40
2    Alabama    2024  0                 69.0                  5024279          36.20                     1.25
3    Alaska     2021  49                NaN                   733391           29.06                     NaN
4    Alaska     2023  49                0.0                   733391           35.98                     1.40
...  ...        ...   ...               ...                   ...              ...                       ...
148  Wisconsin  2023  145               83.0                  5893718          35.98                     1.40
149  Wisconsin  2024  0                 100.0                 5893718          36.20                     1.25
150  Wyoming    2021  26                NaN                   576851           29.06                     NaN
151  Wyoming    2023  23                1.0                   576851           35.98                     1.40
152  Wyoming    2024  0                 1.0                   576851           36.20                     1.25
153 rows × 7 columns
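Because the revenue tables are keyed only on `Year`, a left merge copies each year’s figure onto every state-year row, which is why the values repeat by state in the table above. A minimal sketch using three Alabama rows from that table: `validate="many_to_one"` is a real `pandas.merge` option that raises `MergeError` if a year were accidentally entered twice in the revenue table.

```python
import pandas as pd

# Three Alabama rows from the table above (Dunkin' counts only)
stores = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alabama"],
    "Year": ["2021", "2023", "2024"],
    "Dunkin Donuts Stores": [None, 59.0, 69.0],
})
# Manually entered yearly revenue in billions of USD; there is no 2021 figure
dunkin_revenue_df = pd.DataFrame({"Year": ["2023", "2024"], "Yearly Dunkin Revenue": [1.4, 1.25]})

# many_to_one: each Year may repeat on the left but must be unique on the right
merged = stores.merge(dunkin_revenue_df, on="Year", how="left", validate="many_to_one")
print(merged)
```

The 2021 row keeps its place with a NaN revenue, matching the NaN entries in the Dunkin’ revenue column above.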
starbucks_revenue = {'2021': 29.06, '2023': 35.98, '2024': 36.2}  # billions of USD
dunkin_revenue = {'2023': 1.4, '2024': 1.25}  # billions of USD

# Total number of stores for each company
total_starbucks_stores = 39304
total_dunkin_stores = 19027

# Average revenue per store for each year (in USD)
starbucks_revenue_per_store = {
    year: (revenue * 1e9) / total_starbucks_stores
    for year, revenue in starbucks_revenue.items()
}
dunkin_revenue_per_store = {
    year: (revenue * 1e9) / total_dunkin_stores
    for year, revenue in dunkin_revenue.items()
}

coffee_population_df['Year'] = coffee_population_df['Year'].astype(str)

# Estimated state revenue (USD) = state store count x revenue per store;
# years with no revenue figure fall back to 0, and NaN store counts stay NaN
coffee_population_df['Estimated Starbucks Revenue'] = coffee_population_df.apply(
    lambda row: round(row['Starbucks Stores'] * starbucks_revenue_per_store.get(row['Year'], 0), 2),
    axis=1,
)
coffee_population_df['Estimated Dunkin Revenue'] = coffee_population_df.apply(
    lambda row: round(row['Dunkin Donuts Stores'] * dunkin_revenue_per_store.get(row['Year'], 0), 2),
    axis=1,
)
coffee_population_df
     State      Year  Starbucks Stores  Dunkin Donuts Stores  Population 2020  Yearly Starbucks Revenue  Yearly Dunkin Revenue  Estimated Starbucks Revenue  Estimated Dunkin Revenue
0    Alabama    2021  99                NaN                   5024279          29.06                     NaN                    7.319713e+07                 NaN
1    Alabama    2023  85                59.0                  5024279          35.98                     1.40                   7.781142e+07                 4341199.35
2    Alabama    2024  0                 69.0                  5024279          36.20                     1.25                   0.000000e+00                 4533032.01
3    Alaska     2021  49                NaN                   733391           29.06                     NaN                    3.622888e+07                 NaN
4    Alaska     2023  49                0.0                   733391           35.98                     1.40                   4.485599e+07                 0.00
...  ...        ...   ...               ...                   ...              ...                       ...                    ...                          ...
148  Wisconsin  2023  145               83.0                  5893718          35.98                     1.40                   1.327371e+08                 6107110.95
149  Wisconsin  2024  0                 100.0                 5893718          36.20                     1.25                   0.000000e+00                 6569611.60
150  Wyoming    2021  26                NaN                   576851           29.06                     NaN                    1.922349e+07                 NaN
151  Wyoming    2023  23                1.0                   576851           35.98                     1.40                   2.105485e+07                 73579.65
152  Wyoming    2024  0                 1.0                   576851           36.20                     1.25                   0.000000e+00                 65696.12
153 rows × 9 columns
Create a region variable in your dataset according to the scheme on this Wikipedia page: Northeast, Midwest, South, West. You do not need to scrape this information.
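A minimal sketch of the region variable, assuming the linked page uses the four U.S. Census Bureau regions with the District of Columbia in the South. That would be consistent with the 51 entries per year in the tables above (153 rows over three years). The dictionary names are illustrative, and any state spelled differently in the scraped data maps to NaN, so it is worth checking for missing regions after the `map`.

```python
import pandas as pd

# Four-region scheme of the U.S. Census Bureau (assumed to match the linked page)
CENSUS_REGIONS = {
    "Northeast": ["Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island",
                  "Vermont", "New Jersey", "New York", "Pennsylvania"],
    "Midwest": ["Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin", "Iowa", "Kansas",
                "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota"],
    "South": ["Delaware", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina",
              "Virginia", "District of Columbia", "West Virginia", "Alabama", "Kentucky",
              "Mississippi", "Tennessee", "Arkansas", "Louisiana", "Oklahoma", "Texas"],
    "West": ["Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah",
             "Wyoming", "Alaska", "California", "Hawaii", "Oregon", "Washington"],
}
# Invert to a state -> region lookup
STATE_TO_REGION = {state: region for region, states in CENSUS_REGIONS.items() for state in states}

df = pd.DataFrame({"State": ["Alabama", "Alaska", "Wisconsin", "Wyoming"]})
df["Region"] = df["State"].map(STATE_TO_REGION)  # unmatched names become NaN
print(df)
```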
7. Assess and comment on the prevalence of each chain. Some questions to consider (you don’t need to answer all of these and you may come up with your own):
Are some of these chains more prevalent in certain states than others? Possibly despite having fewer stores overall? Same questions for regions instead of states.
from plotnine import ggplot, aes, geom_bar, labs, theme_minimal, theme, element_text

plot = (
    ggplot(state_brand_data, aes(x='State', y='num_stores', fill='Brand'))
    + geom_bar(stat='identity', position='stack')
    + labs(
        title='Average Number of Stores by State and Brand',
        x='State',
        y='Average Number of Stores',
        fill='Coffee Brand',  # legend title
    )
    + theme_minimal()
    + theme(
        axis_text_x=element_text(rotation=90, hjust=1, size=8),
        figure_size=(16, 8),
    )
)
# show() draws the figure without the deprecated repr(plot) path
plot.show()
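The code that builds `state_brand_data` is not shown. One plausible construction, offered as an assumption rather than the original, melts the wide store columns into long form and averages over years. The sketch uses the 2021 and 2023 rows for Alabama and Wyoming from the tables above and leaves out 2024, where the Starbucks count is 0 in every row shown; pandas’ `mean` skips the missing 2021 Dunkin’ counts.

```python
import pandas as pd

# 2021 and 2023 rows for two states, copied from the tables above
wide = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Wyoming", "Wyoming"],
    "Year": ["2021", "2023", "2021", "2023"],
    "Starbucks Stores": [99, 85, 26, 23],
    "Dunkin Donuts Stores": [None, 59.0, None, 1.0],
})

# Wide -> long: one row per state, year, and brand
long_df = wide.melt(
    id_vars=["State", "Year"],
    value_vars=["Starbucks Stores", "Dunkin Donuts Stores"],
    var_name="Brand",
    value_name="num_stores",
)
long_df["Brand"] = long_df["Brand"].str.replace(" Stores", "", regex=False)

# Average over years; mean() skips the missing 2021 Dunkin' counts
state_brand_data = long_df.groupby(["State", "Brand"], as_index=False)["num_stores"].mean()
print(state_brand_data)
```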
State-Level Insights
Starbucks, with a total of 39,304 stores, and Dunkin’ Donuts, with 19,027 stores, exhibit distinct regional patterns shaped by their historical roots and market strategies. Starbucks, founded in Seattle, Washington, on the West Coast, has a much stronger presence in California, Washington, and Texas. Its dominance in these states, particularly California with over 3,000 stores, likely reflects the influence of West Coast coffee culture on its expansion. In contrast, Dunkin’ Donuts, founded in Quincy, Massachusetts, has a significant concentration of stores in the Northeast, including Massachusetts, New York, New Jersey, and Connecticut. In these states Dunkin’ Donuts often outnumbers Starbucks despite having fewer stores nationally. Massachusetts in particular stands out as a Dunkin’ stronghold, in line with the brand’s origins and long-standing local popularity.
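One way to make “more prevalent despite having fewer stores overall” concrete is to compare each state’s share of a chain’s total footprint. The sketch below uses 2023 counts from the tables above and the totals of 39,304 and 19,027 used in this analysis; if those totals include stores outside the U.S., the shares are only approximate. In Alabama and Wisconsin, Dunkin’ holds a larger share of its own footprint than Starbucks does, even though Starbucks has more stores in both states.

```python
import pandas as pd

TOTAL_STARBUCKS = 39304  # total store counts used in this analysis
TOTAL_DUNKIN = 19027

# 2023 store counts copied from the tables above
counts = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Wisconsin", "Wyoming"],
    "Starbucks Stores": [85, 49, 145, 23],
    "Dunkin Donuts Stores": [59, 0, 83, 1],
})
# Share of each chain's total footprint located in the state
counts["Starbucks Share"] = counts["Starbucks Stores"] / TOTAL_STARBUCKS
counts["Dunkin Share"] = counts["Dunkin Donuts Stores"] / TOTAL_DUNKIN
# True where Dunkin' is relatively more concentrated in the state than Starbucks
counts["Dunkin Relatively Stronger"] = counts["Dunkin Share"] > counts["Starbucks Share"]
print(counts[["State", "Starbucks Share", "Dunkin Share", "Dunkin Relatively Stronger"]])
```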
In states such as Georgia, Florida, and Illinois, Starbucks and Dunkin’ Donuts maintain a relatively balanced presence, indicating active competition in these markets. Both brands, however, have a limited presence in sparsely populated states like Montana, North Dakota, and Wyoming (Wyoming, for example, had 23 Starbucks stores and a single Dunkin’ in 2023), suggesting that population density and market demand play key roles in expansion decisions. Starbucks’ dominance in the West and South, alongside Dunkin’s concentration in the Northeast, shows how both brands have grown outward from their home regions. Although Dunkin’ Donuts has fewer stores overall, it likely keeps its competitive edge in the Northeast through strong brand loyalty and regional identity. Together, these patterns suggest that both chains have built their market presence by leveraging their regional origins and aligning with local consumer preferences.
Regional-Level Insights
The regional distribution of Starbucks and Dunkin’ Donuts reveals distinct patterns in their market presence, influenced by their brand origins and strategic expansion. In the Northeast, Dunkin’ Donuts dominates with the largest number of stores, aligning with its roots in Quincy, Massachusetts. The brand has maintained strong regional loyalty, reinforcing its prominence in this region. Starbucks, though present in the Northeast, has a comparatively smaller footprint.
In contrast, Starbucks holds a commanding presence in the West, where it was founded in Seattle, Washington. The West Coast’s coffee culture has contributed to Starbucks’ significant market penetration, while Dunkin’ Donuts has a minimal presence in this region.
In the South and Midwest, the two brands compete closely, although Starbucks maintains a slight edge in both regions. Starbucks’ broader market appeal and association with urban and suburban settings have helped it expand in these regions. Dunkin’ Donuts, despite its smaller national footprint, has secured a competitive position in the South and Midwest, likely by leveraging its reputation for quick-service coffee and breakfast.
This analysis highlights how both brands capitalize on their regional strengths (Dunkin’ Donuts in the Northeast and Starbucks in the West) while competing in other markets with varying degrees of success. Starbucks, with its larger global footprint, has a stronger presence in most regions, but Dunkin’ Donuts’ deep-rooted loyalty in the Northeast keeps it dominant there.
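A regional comparison amounts to a group-by on the region column. The sketch uses only the four 2023 rows shown in the tables above, tagged with their Census regions, so its totals illustrate the mechanics and are not the real regional totals. The ratio column expresses Dunkin’ stores per Starbucks store in each region.

```python
import pandas as pd

# 2023 rows copied from the tables above, with regions from the Census scheme
rows = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Wisconsin", "Wyoming"],
    "Region": ["South", "West", "Midwest", "West"],
    "Starbucks Stores": [85, 49, 145, 23],
    "Dunkin Donuts Stores": [59, 0, 83, 1],
})
regional = rows.groupby("Region")[["Starbucks Stores", "Dunkin Donuts Stores"]].sum()
# Dunkin' stores per Starbucks store in each region
regional["Dunkin per Starbucks"] = regional["Dunkin Donuts Stores"] / regional["Starbucks Stores"]
print(regional)
```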
How does your chosen financial metric change by state and region for each chain? For example, having 5 stores in California is very different from having 5 stores in Wyoming.
The financial metric used in this analysis is estimated state-level revenue for Starbucks and Dunkin’ Donuts. Because individual state and store revenues are not publicly reported, each company’s total yearly revenue is divided by its total store count to get an average revenue per store, which is then multiplied by the number of stores in each state. The results show that Starbucks dominates state-level revenue in nearly all states, largely because its total revenue is far higher: $29.06 billion in 2021, $35.98 billion in 2023, and $36.2 billion in 2024. Dunkin’ Donuts, by comparison, reported $1.4 billion in 2023 and $1.25 billion in 2024, so Starbucks outperforms Dunkin’ even in states where both brands are present.
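The estimate described above reduces to one line of arithmetic. The helper name `estimate_state_revenue` is hypothetical; the inputs are the 2023 figures used in this notebook, and the results match the Alabama 2023 row of the estimated-revenue table (about $77.8 million for Starbucks and $4.34 million for Dunkin’).

```python
def estimate_state_revenue(state_stores, national_revenue_billions, total_stores):
    """Estimated state revenue in USD = state store count x (national revenue / total stores)."""
    revenue_per_store = national_revenue_billions * 1e9 / total_stores
    return state_stores * revenue_per_store


# Alabama, 2023, using the figures from this notebook
starbucks_al = estimate_state_revenue(85, 35.98, 39304)
dunkin_al = estimate_state_revenue(59, 1.4, 19027)
print(round(starbucks_al), round(dunkin_al, 2))
```

Because every store receives the same national average, the estimate can only rank states by store count within a brand; it cannot capture differences in sales per store between states.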
Because the revenue estimates are proportional to the number of stores per state, states with more locations, such as California, Texas, and Florida, naturally show higher total revenues. Starbucks still generates more estimated revenue even in smaller markets like Alaska, reflecting its broader presence and much higher national revenue per store (roughly $0.92 million versus about $0.07 million for Dunkin’ in 2023 under this method). Regional differences also play a role. Starbucks has a strong presence in urban and coastal areas, particularly in California, New York, and Washington, which produces higher estimated revenues in those regions. Dunkin’ Donuts’ strongest presence is in the Northeast, with Massachusetts, New York, and New Jersey contributing most of its revenue. Even there, however, Starbucks’ total estimated revenue outpaces Dunkin’s because of its larger market share and financial scale.
Given these dynamics, adding five stores in California would likely be more profitable than adding five stores in Wyoming, because California’s larger population, denser customer base, and spending patterns should drive more sales per store. The per-store estimate used here assigns every store the same national average revenue, however, so this difference reflects outside expectations rather than anything the estimates themselves can show. Ultimately, Starbucks’ brand value and broader market reach make it the clear leader in state-level revenue across the U.S. Its higher overall revenue and more extensive geographic presence give it an advantage even in states where the two brands have similar store counts. Dunkin’ Donuts, while successful in select regions, cannot match Starbucks’ financial impact at the national or state level because of its lower overall revenue and more limited geographic penetration. This analysis highlights the significant gap between the two brands, with Starbucks outperforming Dunkin’ in estimated revenue in nearly every state.
Does the distribution of each chain’s stores match population distribution, by both state/region?
The distribution of Starbucks stores aligns closely with population across regions, showing a positive correlation where states with larger populations tend to have more stores. While there is a positive relationship between population size and the number of stores in all regions, Starbucks shows greater consistency in expanding its stores across states with varying population sizes. This suggests that Starbucks follows a broader market penetration strategy, reaching even less populated areas. In contrast, Dunkin’ Donuts’ distribution is more concentrated in specific regions, particularly the Northeast, where it has established brand loyalty. Dunkin’ has a smaller presence in the West and South, focusing more heavily on core markets rather than national distribution. Starbucks, with a more diverse geographic strategy, maintains higher store counts across a wider range of regions and population sizes.
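Raw counts can correlate with population even when stores per resident vary a lot, so normalizing by population is a useful complement. The sketch uses four rows from the tables above (2023 Starbucks counts and the `Population 2020` column); with so few points the correlation only demonstrates the calculation. In these rows the two least populous states, Alaska and Wyoming, have the most Starbucks stores per 100,000 residents.

```python
import pandas as pd

# 2023 Starbucks counts and 2020 populations copied from the tables above
df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Wisconsin", "Wyoming"],
    "Starbucks Stores": [85, 49, 145, 23],
    "Population 2020": [5024279, 733391, 5893718, 576851],
})
# Stores per 100,000 residents normalizes away state size
df["Stores per 100k"] = df["Starbucks Stores"] / df["Population 2020"] * 100_000
# Pearson correlation between raw store counts and population
corr = df["Starbucks Stores"].corr(df["Population 2020"])
print(df.round(2))
print(f"correlation: {corr:.2f}")
```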
Do the financial data match what you’d expect based on the number and locations of the stores? Why or why not?
The financial data align with my expectations: Starbucks generates higher revenue thanks to its broad geographic presence across multiple regions. This agreement is partly by construction, since each state’s estimated revenue is its store count multiplied by a single national per-store figure. Starbucks performs well not only in dense regions like the South and West but also in more distributed areas, reflecting a scalable business model. In contrast, Dunkin’ Donuts focuses primarily on the Northeast, limiting its national revenue potential. The density plots show that Dunkin’s regional concentration results in fewer stores and lower revenue outside its core markets. Overall, the financial performance of both companies reflects their distinct business strategies: national expansion for Starbucks and regional dominance for Dunkin’.
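Because each estimate is a store count times one national per-store figure, dividing estimated revenue by store count should return the same number in every state. A quick check using three 2023 Dunkin’ rows from the estimated-revenue table gives about $73,580 per store each time.

```python
import pandas as pd

# 2023 Dunkin' rows copied from the estimated-revenue table above
est = pd.DataFrame({
    "State": ["Alabama", "Wisconsin", "Wyoming"],
    "Dunkin Donuts Stores": [59.0, 83.0, 1.0],
    "Estimated Dunkin Revenue": [4341199.35, 6107110.95, 73579.65],
})
# Estimated revenue / store count recovers the single national per-store figure
per_store = est["Estimated Dunkin Revenue"] / est["Dunkin Donuts Stores"]
print(per_store.round(2).tolist())
```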
Automate
Convert your code for Exercises 1-3 above to a function that takes a single argument: the URL. This function should:
Scrape the information on state names and corresponding number of store locations on the webpage specified (assume the page has a table in the same form and placement as the ones you scraped above)
Extract the name of the company from either the URL specified or the webpage (assume the URL will have the same format as the ones used above)
Return a clean, organized and tidy dataset. Find a page other than Starbucks and Dunkin’ Donuts to test this on to confirm that it works. It’s fine if this is not related to coffee.
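For the company-name requirement, a small sketch that parses the URL rather than splitting the raw string, so a trailing slash or query string does not break it. It assumes, as the original code does, that the last path segment ends in `-by-state`; `company_from_url` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def company_from_url(url):
    """Turn the last path segment of a '...-by-state' URL into a display name."""
    slug = urlparse(url).path.rstrip("/").split("/")[-1]
    return slug.replace("-by-state", "").replace("-", " ").title()


# Hypothetical URL; only the trailing "-by-state" slug pattern is assumed
print(company_from_url("https://example.com/state-rankings/some-chain-by-state/?ref=x"))
```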
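import requests
from bs4 import BeautifulSoup
import pandas as pd


def scrape_store_data(url):
    # Download the page and fail loudly on HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")

    # The location counts live in the table with class "wpr-table"
    table = soup.find("table", attrs={"class": "wpr-table"})
    if table is None:
        print("Table not found")
        return None

    # Company name from the last URL path segment, minus the "-by-state" suffix
    company_name = url.rstrip('/').split('/')[-1].replace('-by-state', '').replace('-', ' ').title()

    rows = []
    # Skip the header row; each remaining row is one state
    for row in table.find_all("tr")[1:]:
        cells = row.find_all("td")
        header = row.find("th")
        if not cells or header is None:
            continue
        state = header.get_text(strip=True)
        locations = [cell.get_text(strip=True).replace(',', '') for cell in cells]
        # Assumes the count columns run from 2024 backwards, one year per column
        for i, loc in enumerate(locations):
            year = str(2024 - i)
            try:
                store_count = int(loc)
            except ValueError:
                # Treat a blank or non-numeric cell as missing, not as zero stores
                store_count = None
            rows.append({
                'State': state,
                'Year': year,
                f'{company_name} Locations': store_count,
            })

    df = pd.DataFrame(rows)
    return df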
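To produce a single tidy dataset across several chains, one option is to reshape each result to State / Year / Brand / Locations and stack them. The helper `to_long` and the two input frames are hypothetical stand-ins for `scrape_store_data` output (the counts are made up); `str.removesuffix` needs Python 3.9 or later.

```python
import pandas as pd


def to_long(df):
    """Reshape one scrape_store_data()-style frame to State / Year / Brand / Locations."""
    # The value column is the single column named "<Company> Locations"
    value_col = next(c for c in df.columns if c.endswith(" Locations"))
    out = df.rename(columns={value_col: "Locations"})
    out["Brand"] = value_col.removesuffix(" Locations")
    return out[["State", "Year", "Brand", "Locations"]]


# Hypothetical outputs for two chains (made-up counts)
chain_a = pd.DataFrame({"State": ["Ohio", "Utah"], "Year": ["2024", "2024"], "Chain A Locations": [10, 3]})
chain_b = pd.DataFrame({"State": ["Ohio", "Utah"], "Year": ["2024", "2024"], "Chain B Locations": [7, 0]})

combined = pd.concat([to_long(chain_a), to_long(chain_b)], ignore_index=True)
print(combined)
```

Stacking in long form keeps one observation per row, so adding a third chain is just another `to_long` call inside the `concat`.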